Alignments and String Similarity in Information Integration: A Random Field Approach

Authors

  • Mikhail Bilenko
  • Raymond J. Mooney
Abstract

Several problems central to information integration, such as ontology mapping and object matching, can be viewed as alignment tasks where the goal is to find an optimal correspondence between two structured objects and to compute the associated similarity score. The diversity of data sources and domains in the Semantic Web requires solutions to these problems to be highly adaptive, which can be achieved by employing probabilistic machine learning approaches. We present one such approach, Alignment Conditional Random Fields (ACRFs), a new framework for constructing and scoring sequence alignments using undirected graphical models. ACRFs allow incorporating arbitrary features into string edit distance computation, yielding a learnable string similarity function for use in tasks where approximate string matching is needed. We outline possible applications of ACRFs in information integration tasks and describe directions for future work.
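The idea of a parameterized string edit distance can be illustrated with a minimal sketch. This is not the ACRF model described in the abstract: it is a plain dynamic-programming edit distance whose per-operation costs are exposed as a weight dictionary, which stands in for the learned parameters. The function names, the weight keys, and the `exp(-distance)` similarity mapping are all assumptions made for illustration.

```python
import math


def weighted_edit_distance(s, t, weights=None):
    """Edit distance between s and t with configurable operation costs.

    Hypothetical illustration only: the weight keys below are assumptions,
    standing in for parameters that a learned model would estimate.
    """
    w = {"match": 0.0, "substitute": 1.0, "insert": 1.0, "delete": 1.0}
    if weights:
        w.update(weights)

    n, m = len(s), len(t)
    # d[i][j] holds the cheapest cost of aligning s[:i] with t[:j].
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + w["delete"]
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + w["insert"]

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            op = "match" if s[i - 1] == t[j - 1] else "substitute"
            d[i][j] = min(
                d[i - 1][j - 1] + w[op],      # align s[i-1] with t[j-1]
                d[i - 1][j] + w["delete"],    # drop s[i-1]
                d[i][j - 1] + w["insert"],    # add t[j-1]
            )
    return d[n][m]


def similarity(s, t, weights=None):
    """Map the distance to a score in (0, 1]; identical strings score 1.0
    under the default weights."""
    return math.exp(-weighted_edit_distance(s, t, weights))
```

With the default weights this reduces to ordinary Levenshtein distance. Lowering a cost, for example `weighted_edit_distance("cat", "cut", {"substitute": 0.5})`, makes the corresponding edit cheaper, which is the kind of adjustment a learning procedure would make from training data.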


Similar resources

FCICU at SemEval-2017 Task 1: Sense-Based Language Independent Semantic Textual Similarity Approach

This paper describes FCICU team systems that participated in SemEval-2017 Semantic Textual Similarity task (Task1) for monolingual and cross-lingual sentence pairs. A sense-based language independent textual similarity approach is presented, in which a proposed alignment similarity method coupled with new usage of a semantic network (BabelNet) is used. Additionally, a previously proposed integr...

Cluster-Based Image Segmentation Using Fuzzy Markov Random Field

Image segmentation is an important task in image processing and computer vision that attracts the attention of many researchers. The pixels in an image carry two kinds of information, statistical and structural, which refer to the feature values of the pixel data and the local correlation of the pixel data, respectively. Markov random field (MRF) is a tool for modeling statistical and structural inf...

Measuring the Structural Similarity of Web-based Documents: A Novel Approach

Most known methods for measuring the structural similarity of documents are based on, e.g., tag measures, path metrics and tree measures over their DOM trees. Other methods measure the similarity in the framework of the well-known vector space model. In contrast to these, we present a new approach to measuring the structural similarity of web-based documents represented by so c...

Determining the structure and map of vegetation of Mirabad protected area (Iran) using DEM and Geographic Information Systems (GIS)

The Mirabad protected area (S. Azarbaijan, Iran) has a variety of ecological nurseries due to elevation above sea level, physiographic factors, micro-climates and soil types, and has high vegetation diversity. The Mirabad protected area, on the Piranshahr-Sardasht axis, lies between latitudes 36° 23' and 36° 31' north and longitudes 45° 15' and 45° 25', with an area of 11435 ha, in the elevation ...

Publication date: 2005